Data Simulation

Simulating networks

We adopt a top-down approach to simulate hierarchical networks, considering various simulation parameters such as graph sparsity, noise, and the architecture of the super-level graph(s), including small-world, scale-free, and random graph networks (Watts and Strogatz 1998; Barabási and Bonabeau 2003).

Our simulations focus on basic hierarchies comprising one or two hierarchical layers. Two-layer networks mirror classical community detection on graphs, where our aim is to recover the true community labels from a given graph. Meanwhile, three-layer networks present a more intricate scenario, where the bottom layer of the hierarchy contains two levels of community structure. Here, the top level corresponds to the nodes at the uppermost layer of the hierarchy, and the middle level consists of communities nested within the top-level communities. The objective with these networks is to identify both sets of community partitions.

In each hierarchy, for fully connected networks, we initiate by simulating \(n_{\text{top}}\) top-level nodes, adhering to a directed small-world, random graph, or scale-free network architecture (Watts and Strogatz 1998; Barabási and Bonabeau 2003). In cases where the network is disconnected, we simply simulate \(n_{\text{top}}\) disconnected nodes. For networks with three hierarchical layers, we then generate a subnetwork of \(n_{\text{middle}}\) nodes from each top-layer node, adhering to the network structure utilized at the top level. If the network is fully connected, we apply a probability \(p_\text{between}\) to the nodes from different top-level communities being connected.

The final step in all hierarchies is to generate the nodes in the observed (bottom) layer of the hierarchy. For each top-layer or middle-layer node, we generate a subnetwork of \(n_{\text{bottom}}\) nodes under the same subnetwork structure as the previous layers, and we apply a probability \(p_\text{between}\) for nodes from different communities to share an edge.

Simulating gene expression

Once we simulate a hierarchical graph, we utilize this hierarchy to generate the node-feature matrix, which depicts the expression of \(N\) genes across \(p\) samples. Here, \(N\) denotes the number of nodes in the observed (bottom) layer of the hierarchy, and its range is governed by \(a^{\ell+1}<N<a\times b^\ell\), where \(\ell\) signifies the number of hierarchical layers.

We simulate the node-feature matrix using the topological order the observed level graph. We start by generating the features of nodes that have no parental input. We refer to these nodes as origin nodes. All origin nodes are simulated from a normal distribution with mean \(0\) and standard deviation \(\sigma\). All other nodes are simulated from a normal distribution centered at the mean of their parent nodes and with standard deviation \(\sigma\).

Hierarchical Commuity Detection (HCD) Overview

Our HCD method consists of two primary components:

  1. A graph autoencoder based on the architecture proposed by Salehi and Davulcu (2019) which utilizes graph attention layers such as those first indroduced by Veličković et al. (2017) (See most recent version of pseudocode for details). In our applications, we incorporate multi-head attention in all encoder and decoder layers to expand model learning capacity. The graph autoencoder module takes a set of node attributes and an adjacency matrix defining the relationships between the node as input and learns a low dimensional embedding of the network and attributes. This embedding is then used to reconstruct the node attributes and adjacency matrix under a separate loss function for each.

  2. The second component of HCD takes the embeddings generated by the autoencoder and applies a multilevel community detection process. This module is composed of \(l\) fully connected layers, each representing a level in the hierarchy. Each layer’s goal is to group the \(k_{i-1}\) nodes from the previous level into \(k_i\) communities. The number of layers in the hierarchy and the number of communities at each level are predefined parameters that need to be determined through other methods. In our applications, we use the simulation truth for each parameter so that the communitiy detection module consists of \(2\) layers where the first layer assigns the nodes at the bottom layer to the true number of communities at the middle layer. The second community detection layer assigns the nodes in the middle layer to communities in the top layer.

Datasets

We consider three sets of hierarchical networks which represent varying difficulty levels for inference:

  1. Complex networks - used for final simulation assessment - Table 1.1 - 1.3 .

  2. Intermediate networks - used for investigative model tuning and performance assessment - Table 1.4 .

  3. Simple networks - used for code implementation and debugging - Table 1.5.

Application to Intermediate Networks

A comprehensive overview of the intermediate networks is presented in Table 1.4. These networks are structured as three-layered systems, each characterized by small-world, scale-free, or random graph architectures. In contrast to the more intricate networks featured in the Complex Networks dataset, the intermediate networks exhibit a comparatively simpler configuration. Specifically, each network comprises \(5\) super layer nodes, \(15\) middle layer nodes, and \(300\) bottom layer nodes. Our primary focus in utilizing this dataset is to examine the performance of the Hierarchical Community Detection (HCD) method when applied to three-layer networks. The smaller scale of these networks facilitates a more in-depth analysis of the detected communities within the middle and upper layers of their hierarchical structures.

We apply the HCD method to each network separately using three options for the input graph corresponding to the nodes at the observed layer of the hierarchy:

We also explore various combinations of weighting the loss function across each of the aforementioned input graphs. In all cases, we ensure that the predicted number of communities in the middle or top levels of the hierarchy aligns with the ground truth of the simulation.

Evaluating performance

We evaluate the performance of our HCD method using three graph-based clustering metrics:

  1. homogeneity evaluates the degree to which each predicted community contains only data points from a single true community, indicating how well the algorithm avoids mixing different groups. Thus, homogenity tends to be high if resolved communities contain only members of the same true community.

  2. completeness assesses the extent to which all data points that belong to the same true community are correctly assigned to a single predicted community. Thus completeness is always high if all members of the same true communities end up in the same resolved community even if several true communities are allocated together.

  3. NMI is a weighted average of the previous two metrics.

For each simulation, we configure the number of communities in the middle and upper layers of the hierarchy to match the true count in each layer. Then, we evaluate the community predictions of the Hierarchical Community Detection (HCD) algorithm at these levels against the actual communities using three metrics. As a baseline, we employ the Louvain method, which utilizes hierarchical graph partitioning to maximize modularity, resulting in a single set of resolved communities. These resolved communities may align with the middle, upper, or a combination of both layers in the true hierarchy. Thus, we compute the performance metrics of the communities identified by the Louvain method against the true communities at both the upper and middle levels of the hierarchy.

Examples

Results

Comparing small world, scale free, and random graphs: small world disconnected intermediate networks

In this section, we compare HCD and louvain performance on small world, scale free, and random graph networks in this simplest/easiest scenario: 3-layer intermediate networks with origin nodes simulated from \(N(\mu_k, \sigma)\) distributions. These three hierarchies are the simplest in terms of the separability of the communities in the middle and top layers. A full summary of the networks in this dataset can be found in Table 1.4. We apply HCD with the “best” parameter settings from our previous results outlined in Report 4/15/2025 which are summarized in Table 1.6. Overall, HCD performs fairly well under these parameter settings for the scale free and small world networks (Table 1.7). However, performance on the random graph network is much lower - likely due to the lack of community structure in this network (Figure ??). For the scale free and small world networks, HCD performs similar to the Louvain method. Like the Lovain method, HCD is primarily capturing the top layer in the small world network and the middle layer in the scale free network. Overall, these performance values are lower than previously reported in Report 4/15/2025. This is may be because the data simulation procedure is different (i.e unique dists for parent nodes) and the parameter settings need to be futher tuned. In Example 11 we should that HCD is able to achieve considerably improved performance after downweighting the clustering loss further (Table 1.7).

Issues with loss function

Our loss function combines four components: (i) MSE loss on the reconstructed graph, (ii) BCE loss on the reconstructed graph, (iii) modularity for community predictions summed across all \(\ell\) hierarchical layers in the hierarchy, and (iv) the clustering loss measured as the Within Cluster Sum of squares (WCSS) for community predictions summed across all \(\ell\) hierarchical layers. Components (i - ii) apply to the graph auto encoder (GATE) and aim to ensure the learned embedding respects the original node attributes and input graph. Components (iii - iv) evaluate the quality of the community predictions where loss component (iii) evalutes their quality relative to the input graph and loss component (iv) evaluates their quality with respect to the node attributes. Currently the WCSS clustering loss component is calculated on the embeddings as follows:

\[ L_C = \sum_{\ell=1}^\mathcal{L} \frac{1}{k_{\ell-1}\cdot k_\ell} tr({\bf D}_\ell^T {\bf D}_\ell) \tag{1}\label{eq:eq1}\]

where \(\ell\) is the number of hierarchical layers, \(k_\ell\) is the number predicted communities in the \(\ell^{th}\) hierarchical graph, and where \({\bf D}_\ell \in \mathbb{R}^{q \times k_\ell}\) is a matrix of node deviations computed as:

\[ {\bf D}_\ell =\left[{\bf \tilde X}^{(\ell-1)}\right]^T - {\bf M}_\ell{\bf P}_\ell^T \tag{2}\label{eq:eq2}\]

where \({\bf \tilde X}^{(\ell - 1)} \in \mathbb{R}^{N \times q}\) is the node embedding in the preceding hierarchical layer, \({\bf P}_\ell \in \mathbb{R}^{k_{\ell-1} \times k_\ell}\) is a matrix of community assignment probabilities, and \({\bf M}_\ell \in \mathbb{R}^{q\times k_\ell}\) is matrix of community centroids. One challenge with defining the clustering loss component on the model embedding is that it creates a bias which encourages the model to shift the positions of the nodes in the embedding space so that they are closer together enabling a better score on the loss function.

To help illustrate this, we present three more examples outlined in Table 1.8 where we apply HCD to a small world network with 3 layers and origin nodes simulated from different distributions. In the first example, Clustering Loss Example 1, we apply HCD with the usual parameters settings and the clustering loss as defined above. In this example, the clustering bias is made apparent when comparing the nodes and their true communities in both the PCA and embedding spaces. As can be seen in Figure 2.2 the top layer communities are completely separated in the PC space and so should be easily detected, however, we can see that they are more ambiguous when viewed in the embedding space. The final predictions from this example lead to parts of two communities to be merged into a single super-community. In effect the model has cheated to get a better overall loss (Table 1.9, Figures 2.3 - 2.6).

To address this bias, we adjust the clustering loss function given eq 1 to compute the loss using the original node attributes. Since the node features remain unchanged through the training process the model is not encouraged to manipulate their presentations in the embedding space. To compute the loss using the original node attributes, we must know the mapping between the original \(N\) nodes to their community representations in higher hierarchical layers. This can be done through a simple multiplication of the assignment probabilities from previous layers:

\[ {\bf Q}_\ell = \prod_{i = 1}^{\mathcal{\ell - 1}}{\bf P}_i\] \[{\bf P}_1 \times {\bf P_2} \times \cdots \times {\bf P}_{\ell -1} \]

Where the operation \({\bf Q}_\ell = {\bf P}_{\ell -1 } \otimes {\bf P}_\ell\) is the dot product between matrices \({\bf P}_{\ell -1 }\) and \({\bf P}_\ell\) where the elements in \(\bf Q\) are computed as

\[q_{i,j} = {\bf p}_{\ell -1,i}^T \circ {\bf p}_{\ell,j} = \sum_{s = 1}^{k_{\ell - 1}} p_{i,s}^{(\ell - 1)} \cdot p_{s,j}^{(\ell)}\]

for \(p_{i,j}^{(\ell)} \in {\bf P}_\ell\). The vector product is akin to computing the marginal probability of assigning node \(n_i\) to community \(C_k\) in the \(\ell^{th}\) hierarchical layer summed or all possible assignments in the previous layer(s):

\[ q_{m,k}= \sum_{m = 1}^{k_{\ell - 1}}Pr\left( n_i \in C_m^{(\ell - 1)}\right) \cdot Pr\left(n_i \in C_{k}^{(\ell)}| n_i \in C_m^{(\ell - 1)} \ \right) \]

We present the results from Cluster Loss Example 2 which uses the above clustering loss computed on the original node features applied to the same network as in Cluster Loss Example 1. As can be seen in Figures 2.4 - 2.7 and Table 1.9, using the node attributes to compute the cluster loss leads to perfect clustering performance on the top layer communities, but the middle layer communities are no longer detectable. This is possible due to a known issue with k-means like optimization in which communities that are too small or nested within larger structures are undetectable.

In Cluster Loss Example 3 we again apply the HCD model with the same clustering loss component as in Cluster Loss Example 1, but this time we increase the attribute reconstruction loss to see if the bias in the model’s predictions were due to the attribute reconstruction have negligable influence on the total loss. As can be seen in Table 1.9, this did little to improve performance. However, increasing the attribute reconstruction loss component does appear to prevent overgrouping as all five communities are detected to some degree (Figures 2.5 - 2.8). This suggest that with further refinement of this parameter, the loss computed on the embeddings can detect the true communities and further provide some community-level information about intermediate layers in the hierarchy.

Using Approximate Graphs as Input

In this section, compare the performance of HCD and Louvain when the input graph is an approximation of the true graph. We apply both methods to the small world, scale free, and random graph networks used in Examples 8-11. For each application, we use the correlation matrix of the node attributes wherein correlations weaker than 0.5 are disregarded and set to zero (Section Application to Intermediate Networks). The parameter settings are provided in Table 1.10. We apply HCD with same clustering loss parameter settings as Example 11 as this settings lead to greater performance. The performance results are given in Table 1.11.

future points to investigate

1 Tables

Table 1.1: Summary statistics for all small world networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 2 0.1 (10, 63) (0, 63) 0.05 500 0.90 1.00 6.3 0.00 NA NA NA NA
small world disc 2 0.5 (10, 63) (0, 63) 0.05 500 0.90 1.00 6.3 0.00 NA NA NA NA
small world disc 3 0.1 (10, 63, 1604) (0, 63, 2011) 0.05 500 0.90 1.25 201.1 0.00 0.76 1.25 24.82 0.11
small world disc 3 0.5 (10, 63, 1604) (0, 63, 2031) 0.05 500 0.90 1.27 203.1 0.00 0.76 1.27 24.97 0.12
small world full 2 0.1 (10, 63) (45, 115) 0.05 500 0.45 1.82 6.3 0.58 NA NA NA NA
small world full 2 0.5 (10, 63) (45, 109) 0.05 500 0.48 1.73 6.3 0.51 NA NA NA NA
small world full 3 0.1 (10, 63, 1604) (45, 114, 1604) 0.05 500 0.77 1.43 199.3 3.39 0.67 1.43 24.94 0.19
small world full 3 0.5 (10, 63, 1604) (45, 111, 1604) 0.05 500 0.77 1.44 201.3 3.28 0.66 1.44 24.87 0.19
Table 1.2: Summary statistics for all scale free networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
scale free disc 2 0.1 (10, 58) (0, 74) 0.05 500 0.89 1.28 7.4 0.00 NA NA NA NA
scale free disc 2 0.5 (10, 58) (0, 74) 0.05 500 0.89 1.28 7.4 0.00 NA NA NA NA
scale free disc 3 0.1 (10, 58, 1450) (0, 74, 6700) 0.05 500 0.89 4.62 670.0 0.00 0.91 4.62 107.07 0.15
scale free disc 3 0.5 (10, 58, 1450) (0, 74, 6670) 0.05 500 0.89 4.60 667.0 0.00 0.91 4.60 107.07 0.14
scale free full 2 0.1 (10, 58) (45, 120) 0.05 500 0.51 2.07 7.4 0.51 NA NA NA NA
scale free full 2 0.5 (10, 58) (45, 120) 0.05 500 0.51 2.07 7.4 0.51 NA NA NA NA
scale free full 3 0.1 (10, 58, 1450) (45, 123, 1450) 0.05 500 0.85 4.78 665.9 3.03 0.88 4.78 107.07 0.22
scale free full 3 0.5 (10, 58, 1450) (45, 122, 1450) 0.05 500 0.85 4.84 671.4 3.42 0.86 4.84 107.07 0.25
Table 1.3: Summary statistics for all random graph networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
random graph disc 2 0.1 (10, 45) (0, 32) 0.05 500 0.88 0.71 3.2 0.00 NA NA NA NA
random graph disc 2 0.5 (10, 45) (0, 32) 0.05 500 0.88 0.71 3.2 0.00 NA NA NA NA
random graph disc 3 0.1 (10, 45, 725) (0, 32, 678) 0.05 500 0.89 0.94 67.8 0.00 0.78 0.94 12.16 0.07
random graph disc 3 0.5 (10, 45, 725) (0, 32, 665) 0.05 500 0.88 0.92 66.5 0.00 0.80 0.92 12.22 0.06
random graph full 2 0.1 (10, 45) (45, 77) 0.05 500 0.31 1.71 3.2 0.50 NA NA NA NA
random graph full 2 0.5 (10, 45) (45, 77) 0.05 500 0.31 1.71 3.2 0.50 NA NA NA NA
random graph full 3 0.1 (10, 45, 725) (45, 78, 725) 0.05 500 0.76 1.04 65.5 1.10 0.70 1.04 12.18 0.10
random graph full 3 0.5 (10, 45, 725) (45, 78, 725) 0.05 500 0.72 1.09 65.7 1.48 0.67 1.09 12.16 0.12
Table 1.4: Summary statistics for intermediate difficulty simulated networks.
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 3 0.1 (5, 15, 300) (0, 15, 363) 0.05 500 0.80 1.21 72.6 0.00 0.76 1.21 20.00 0.30
small world full 3 0.1 (5, 15, 300) (10, 25, 300) 0.05 500 0.68 1.35 70.8 2.50 0.68 1.35 20.00 0.50
scale free disc 3 0.1 (5, 15, 300) (0, 10, 975) 0.05 500 0.78 3.25 195.0 0.00 0.86 3.25 61.33 0.26
scale free full 3 0.1 (5, 15, 300) (10, 20, 300) 0.05 500 0.73 3.35 191.4 2.40 0.84 3.35 61.33 0.41
random graph disc 3 0.1 (5, 12, 167) (0, 7, 138) 0.05 500 0.79 0.83 27.6 0.00 0.76 0.83 9.67 0.17
random graph full 3 0.1 (5, 12, 167) (10, 17, 167) 0.05 500 0.64 0.92 26.2 1.15 0.67 0.92 9.75 0.28
Table 1.5: Summary statistics for simple simulated networks. These networks contain fewer than 100 nodes at the observed level and only cover small world subgraph architecture.
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 2 0.1 (2, 6) (0, 6) 0.05 500 0.50 1.00 3 0.0 NA NA NA NA
small world disc 3 0.1 (2, 6, 18) (0, 6, 24) 0.05 500 0.50 1.33 12 0.0 0.58 1.33 3 0.20
small world full 2 0.1 (2, 6) (1, 7) 0.05 500 0.36 1.17 3 0.5 NA NA NA NA
small world full 3 0.1 (2, 6, 18) (1, 7, 18) 0.05 500 0.46 1.39 12 0.5 0.55 1.39 3 0.23
Table 1.6: HCD settings for Examples 8-11.
Parameter Example 8 Example 9 Example 10 Example 11
Hidden Layer Dimensions: [256, 128, 64, 32] [256, 128, 64, 32] [256, 128, 64, 32] [256, 128, 64, 32]
X_loss: 0.0001 0.0001 0.0001 0.0001
Modularity_loss: 0.001 0.001 0.001 0.001
Clustering_loss: [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001] [0.0001, 1e-05]
Learning_rate: 1e-05 1e-05 1e-05 1e-05
True Comms: True True True True
Epochs: 500 500 500 500
Multi Head Attn: True True True True
Use Attention: True True True True
Attn. Heads: 10 10 10 10
Normalize_inputs: True True True True
Input Graph True Graph True Graph True Graph True Graph
Layers 3 3 3 3
Network Type Scale Free/Disconnected Random Graph/Disconnected Small World/Disconnected Small World/Disconnected
Dataset Intermediate Networks Intermediate Networks Intermediate Networks Intermediate Networks
Origin Node Dist. \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\)
Network ID: 3L-SF-DiffDist-Inter 3L-SMW-DiffDist-Inter 3L-RG-DiffDist-Inter 3L-SMW-DiffDist-Inter
Table 1.7: Performance metrics for Louvain method and HCD in Examples 8-11. For details regarding parameter settings see Examples 8-11 under section Examples. All networks are disconnected with 3 layers and all origin nodes are simulated from a normal distribution with unique center.
Comparison Homogeneity Completeness NMI Comparison Homogeneity Completeness NMI
Example 8: Scale Free Example 10: Small World
louvain vs middle 0.8878 0.9891 0.9357 louvain vs middle 0.6508 0.9038 0.7568
louvain vs top 1 0.6622 0.7968 louvain vs top 1 0.8253 0.9043
HCD: middle vs middle 0.8665 0.8981 0.882 HCD: middle vs middle 0.5298 0.6337 0.5771
HCD: middle vs top 0.8759 0.5396 0.6678 HCD: middle vs top 0.8145 0.579 0.6769
HCD: top vs top 0.5321 0.6191 0.5723 HCD: top vs top 0.7768 0.9269 0.8452
Example 9: Random Graph Example 11: Small World
louvain vs middle 0.8125 0.6267 0.7076 louvain vs middle 0.6518 0.8342 0.7318
louvain vs top 1 0.4969 0.6639 louvain vs top 1 0.7606 0.864
HCD: middle vs middle 0.506 0.5239 0.5147 HCD: middle vs middle 0.6699 0.6953 0.6824
HCD: middle vs top 0.6389 0.4262 0.5113 HCD: middle vs top 1 0.6169 0.7631
HCD: top vs top 0.4352 0.4434 0.4392 HCD: top vs top 1 1 1
Table 1.8: HCD settings for Loss Examples 1-3.
Parameter Clustering Loss Example 1 Clustering Loss Example 2 Clustering Loss Example 3
Hidden Layer Dimensions: [256, 128, 64, 32] [256, 128, 64, 32] [256, 128, 64, 32]
X_loss: 0.0001 0.0001 0.001
Modularity_loss: 0.001 0.001 0.001
Clustering_loss: [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001]
Learning_rate: 1e-05 1e-05 1e-05
Comms Layers: [15, 5] [15, 5] [15, 5]
Epochs: 500 500 500
Multi Head Attn: True True True
Use Attention: True True True
Attn. Heads: 20 20 20
Normalize_inputs: True True True
Input Graph True Graph True Graph True Graph
Layers 3 3 3
Network Type Small World/Disconnected Small World/Disconnected Small World/Disconnected
Dataset Intermediate Networks Intermediate Networks Intermediate Networks
Origin Node Dist. \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\)
Network ID: Special-3L-SMW-DiffDist-Inter Special-3L-SMW-DiffDist-Inter Special-3L-SMW-DiffDist-Inter
Table 1.9: Performance metrics for Louvain method and HCD in Loss Examples 1 - 3. In these examples, two versions of the clustering loss component of HCD are compared on a small world 3 layer disconnected network with unique parent nodes (i.e simulated from different distributions). For specific details regarding the parameter settings see Loss Examples 1 - 3 under section Examples. The same network is used in both examples and HCD and the Louvain method are applied to the same data. In example 1, the clustering loss is computed using the node embeddings. In example 2, the clustering loss is computed using the original node attributes.
Clustering Loss Example 1
Clustering Loss Example 2
Clustering Loss Example 3
Comparison Homogeneity Completeness NMI Homogeneity Completeness NMI Homogeneity Completeness NMI
louvain vs middle 0.744 0.938 0.830 0.705 0.895 0.788 0.688 0.849 0.760
louvain vs top 1.000 0.749 0.856 1.000 0.754 0.860 1.000 0.734 0.847
HCD: middle vs middle 0.657 0.677 0.667 0.644 0.947 0.767 0.640 0.729 0.682
HCD: middle vs top 0.916 0.561 0.696 1.000 0.874 0.932 0.902 0.610 0.728
HCD: top vs top 0.782 0.817 0.799 1.000 1.000 1.000 0.810 0.829 0.820
Table 1.10: HCD settings for Examples 12-14.
Parameter Example 12 Example 13 Example 14
Hidden Layer Dimensions: [256, 128, 64, 32] [256, 128, 64, 32] [256, 128, 64, 32]
X_loss: 0.0001 0.0001 0.0001
Modularity_loss: 0.001 0.001 0.001
Clustering_loss: [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001]
Learning_rate: 1e-05 1e-05 1e-05
True Comms: True True True
Epochs: 500 500 500
Multi Head Attn: True True True
Use Attention: True True True
Attn. Heads: 10 10 10
Normalize_inputs: True True True
Input Graph 0.5 Corr. Graph 0.5 Corr. Graph 0.5 Corr. Graph
Layers 3 3 3
Network Type Small World/Disconnected Scale Free/Disconnected Random Graph/Disconnected
Dataset Intermediate Networks Intermediate Networks Intermediate Networks
Origin Node Dist. \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\) \(N(\mu_k, \sigma)\)
Network ID: 3L-RG-DiffDist-Inter 3L-SF-DiffDist-Inter 3L-SMW-DiffDist-Inter
Table 1.11: Performance metrics for Louvain method and HCD in Examples 12-14. For details regarding parameter settings see Examples 12-14 under section Examples. All networks are disconnected with 3 layers and all origin nodes are simulated from a normal distribution with unique center.
Comparison Homogeneity Completeness NMI Comparison Homogeneity Completeness NMI
Example 12: Small World Example 14: Random Graph
louvain vs middle 0.642 0.8676 0.7379 louvain vs middle 0.9539 0.5499 0.6976
louvain vs top 1 0.8031 0.8908 louvain vs top 1 0.3714 0.5417
HCD: middle vs middle 0.6583 0.7751 0.7119 HCD: middle vs middle 0.3324 0.3394 0.3359
HCD: middle vs top 0.9683 0.6776 0.7973 HCD: middle vs top 0.264 0.1736 0.2095
HCD: top vs top 0.8249 0.8276 0.8262 HCD: top vs top 0.1244 0.1406 0.132
Example 13: Scale Free
louvain vs middle 0.8347 0.8929 0.8628
louvain vs top 1 0.6357 0.7773
HCD: middle vs middle 0.8353 0.8708 0.8527
HCD: middle vs top 0.9097 0.5636 0.696
HCD: top vs top 0.4856 0.5327 0.5081

2 Figures

Prediction performance for Examples 8-10

Figure 2.1: Prediction performance for Examples 8-10

TSNE/PCA of nodes attribute matrix. Colors indicate true community assignments at the top layer of the hierarchy

Figure 2.2: TSNE/PCA of nodes attribute matrix. Colors indicate true community assignments at the top layer of the hierarchy

TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 1

Figure 2.3: TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 1

TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 2

Figure 2.4: TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 2

TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 3

Figure 2.5: TSNE/PCA of node embeddings at the bottleneck layer of GATE (top) and the first hierarchical layer (bottom). Colors indicate the predicted community assignments at the top layer of the hierarchy for Clustering Loss Example 3

Prediction performance for Clustgering Loss Example 1

Figure 2.6: Prediction performance for Clustgering Loss Example 1

Prediction performance for Clustering Loss Example 2

Figure 2.7: Prediction performance for Clustering Loss Example 2

Prediction performance for Clustering Loss Example 2

Figure 2.8: Prediction performance for Clustering Loss Example 2

True and predicted adjacency matrices for examples 12 - 14

Figure 2.9: True and predicted adjacency matrices for examples 12 - 14

Heatmaps showing HCD prediction performance for examples 12 - 14

Figure 2.10: Heatmaps showing HCD prediction performance for examples 12 - 14

Correlations on the simulated data and HCD model bottleneck dimension embeddings for examples 12-14

Figure 2.11: Correlations on the simulated data and HCD model bottleneck dimension embeddings for examples 12-14

Loss curves for examples 12 - 14

Figure 2.12: Loss curves for examples 12 - 14

3 References

Barabási, Albert-László, and Eric Bonabeau. 2003. “Scale-Free Networks.” Scientific American 288 (5): 60–69.
Salehi, Amin, and Hasan Davulcu. 2019. “Graph Attention Auto-Encoders.” arXiv Preprint arXiv:1905.10715.
Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. “Graph Attention Networks.” arXiv Preprint arXiv:1710.10903.
Watts, Duncan J, and Steven H Strogatz. 1998. “Collective Dynamics of ‘Small-World’networks.” Nature 393 (6684): 440–42.